Abstract

We solve the cost-minimization problem posed by Ferrer i Cancho and Solé in their model of communication that aimed at explaining the origin of Zipf's law [PNAS 100, 788 (2003)]. Direct analysis shows that the minimum cost is $\min\{\lambda, 1-\lambda\}$, where $\lambda$ determines the relative weights of the speaker's and hearer's costs in the total. The nature of the minimizing solution changes discontinuously at $\lambda = 1/2$, being qualitatively different for $\lambda < 1/2$, $\lambda > 1/2$, and $\lambda = 1/2$. Zipf's law is found only in a vanishing fraction of the minimum-cost solutions at $\lambda = 1/2$ and therefore is not explained by this model. We also investigate the solutions reached by the previously used minimization algorithm and find that they correctly recover global minimum states at the transition.

I. INTRODUCTION



Among the numerous empirically reported power-law distributions, one of the oldest and best supported statistically is Zipf's law, which states that the frequency $F_k$ of the $k$-th most frequent word decays as $F_k \simeq C/k^{\mu}$ with $\mu \simeq 1$ [1]. While there are various stochastic models of text generation that reproduce this and other statistical features of corpora [2-4], a definitive answer to the more fundamental question of why natural language shows Zipf's law is still lacking. Zipf argued that it is a consequence of the tendency of speakers and hearers to communicate with least effort [1]. Ferrer i Cancho and Solé recently proposed a quantitative model that builds on these ideas and suggests how natural language could have evolved to a state satisfying Zipf's law [5]. The importance of this work, which we revisit here, is that it introduced a framework of language games to explain Zipf's law which influenced many subsequent works [6-8] and contributed to the current interest in modeling different aspects of language dynamics [9].

In the framework introduced in Ref. [5], which fits into a more general modeling scheme of language evolution [10], Ferrer i Cancho and Solé considered a scenario of "objects" reported to a listener by a speaker using a certain lexicon of symbols. The speaker's cost of communication, $\alpha$, is related to the average information per symbol, and the listener's cost, $\beta$, to the mean ambiguity of symbols. (A symbol is ambiguous if it denotes more than one object.) It was proposed that as the language evolves the cost function $\Omega(\lambda) = \lambda\beta + (1-\lambda)\alpha$ is minimized, with the parameter $\lambda \in [0, 1]$. Studying the minimization problem numerically, the authors assert that the model exhibits a phase transition at a certain value of $\lambda$, at which Zipf's law is satisfied. Continuous phase transitions are related to power-law distributions (associated, for example, with long-range correlations and self-organized criticality) and their possible connection to Zipf's law is another appealing idea of Ref. [5] that motivates our work.

In this paper we show that the minimum cost in the language game proposed in Ref. [5] is simply $\min\{\lambda, 1-\lambda\}$. At $\lambda = 1/2$ the nature of the minimum-cost state changes discontinuously and multiple minimum-cost states with varied properties coexist. The states satisfying Zipf's law are found to be extremely rare, comprising a vanishing fraction of all minimum-cost states in the relevant limit of a large number of symbols and objects. While these facts are demonstrated analytically, via direct calculation, it is also of interest to examine the numerical minimization approaches to see if the observation of Zipf's law can arise from a failure to attain the minimum-cost state, as speculated in Ref. [7]. Using a stochastic algorithm along the lines proposed in [5], we find that non-Zipfian minimum-cost states are attained for small systems, and that this trend continues with increasing system size (even though the final cost may be fractionally above the theoretical minimum owing to limited computer resources).

The remainder of this paper is organized as follows. In Sec. II we define the model in detail, in Sec. III we compute the states minimizing the cost function $\Omega(\lambda)$, in Sec. IV we report the results of the simulations, and in Sec. V we summarize our conclusions.

II. MODEL 

In this section we define the model proposed by Ferrer i Cancho and Solé using a notation and terminology that differ somewhat from those of [5], but which we believe facilitate the analysis. Consider the interaction between a "speaker" and a "listener" in a language consisting of $n > 1$ symbols, $s_1, \ldots, s_n$, used to describe a world of $m > 1$ objects, $r_1, \ldots, r_m$. The relation between symbols and objects is defined via a lexical matrix $A$ [11]: if symbol $s_i$ is used to designate object $r_j$, then $A_{ji} = 1$; otherwise this element is zero. The same symbol may be used to designate more than one object (in principle, all $m$ objects could be designated by the same symbol), and several symbols may refer to the same object. By definition, each object is represented by at least one symbol, so that each row of $A$ possesses at least one nonzero element.

A key assumption of the model is that in the communication between speaker and listener, all objects occur with the same probability, so that $p(r_j) = 1/m$ for all $j$. Define a communication event as the occurrence of an object and the speaker reporting this to the listener. When the speaker refers to object $r_j$, she uses each of the symbols that refer to this object (i.e., those for which $A_{ji} = 1$) with equal likelihood. This implies that the probability of symbol $s_i$ over the space of all possible communication events is

$$p(s_i) = \sum_{j=1}^{m} p(r_j)\, p(s_i|r_j) = \frac{1}{m} \sum_{j=1}^{m} \frac{A_{ji}}{\sum_{k=1}^{n} A_{jk}} \equiv \frac{1}{m} \sum_{j=1}^{m} B_{ji}, \tag{1}$$

where we have introduced the matrix $B$, obtained from $A$ by dividing the elements of each row by the corresponding row sum. (Thus the row sums of $B$ are all unity.)

The speaker chooses among $n$ symbols, from a probability distribution $p(s_i)$. Following Shannon [12], the mean information per symbol may therefore be defined as

$$H_n(S) = -\sum_{i=1}^{n} p(s_i) \log_n p(s_i) = -\frac{1}{\ln n} \sum_{i=1}^{n} p(s_i) \ln p(s_i) \equiv \alpha. \tag{2}$$

Note that the use of $\log_n$ imposes the condition $0 \le \alpha \le 1$. Ferrer i Cancho and Solé interpret this quantity as the speaker's cost in communicating. It is zero when only one symbol is used (an impoverished language indeed!) and unity when all $n$ symbols have the same probability.

The listener's cost, $\beta$, is related to ambiguity; when each symbol refers to a unique object, there is no ambiguity and $\beta = 0$. To define the listener's cost in the presence of ambiguity, we begin by defining the cost in interpreting symbol $s_i$:

$$H_m(R|s_i) = -\sum_{j=1}^{m} p(r_j|s_i) \log_m p(r_j|s_i), \tag{3}$$

where the conditional probability of object $r_j$, given reception of symbol $s_i$, is

$$p(r_j|s_i) = \frac{p(r_j)\, p(s_i|r_j)}{p(s_i)} = \frac{B_{ji}}{m\, p(s_i)} = \frac{B_{ji}}{\sum_{k=1}^{m} B_{ki}} \equiv C_{ji}. \tag{4}$$

In the final equality we have defined $C$ as the matrix obtained from $B$ by dividing each element in a given column by the corresponding column sum, so that each column sum in $C$ is unity. Thus,

$$H_m(R|s_i) = -\frac{1}{\ln m} \sum_{j=1}^{m} C_{ji} \ln C_{ji}. \tag{5}$$

The cost per symbol to the listener is then defined as

$$H_m(R|S) = \sum_{i=1}^{n} p(s_i)\, H_m(R|s_i) \equiv \beta. \tag{6}$$

Evidently, $\beta$ is also restricted to $[0, 1]$. Both $\alpha$ and $\beta$ are invariant under permutations of the $n$ symbols, and of the $m$ objects.
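The definitions of the two costs can be made concrete with a minimal Python sketch. The function name `costs` and the list-of-rows representation of the lexical matrix are illustrative assumptions, not part of the original model description; the computation follows Eqs. (1)-(6) directly.

```python
import math

def costs(A):
    """Return (alpha, beta) for a 0/1 lexical matrix A given as a list of rows.

    Rows index objects r_j and columns index symbols s_i. Every row is
    assumed to contain at least one 1, as the model requires.
    """
    m, n = len(A), len(A[0])
    # B: each row of A divided by its row sum (Eq. 1).
    B = []
    for row in A:
        s = sum(row)
        B.append([a / s for a in row])
    col = [sum(B[j][i] for j in range(m)) for i in range(n)]  # column sums of B
    p = [c / m for c in col]                                   # p(s_i), Eq. (1)
    # Speaker's cost, Eq. (2); unused symbols (p = 0) contribute nothing.
    alpha = -sum(x * math.log(x) for x in p if x > 0) / math.log(n)
    # Listener's cost, Eqs. (3)-(6).
    beta = 0.0
    for i in range(n):
        if col[i] == 0:
            continue
        h = 0.0
        for j in range(m):
            if B[j][i] > 0:
                c = B[j][i] / col[i]          # C_ji, Eq. (4)
                h -= c * math.log(c) / math.log(m)
        beta += p[i] * h
    return alpha, beta

# Two extreme lexicons for n = m = 4.
single_symbol = [[1, 0, 0, 0] for _ in range(4)]
one_to_one = [[1 if i == j else 0 for i in range(4)] for j in range(4)]
print(costs(single_symbol), costs(one_to_one))
```

Up to floating-point rounding, the single-symbol lexicon gives $(\alpha, \beta) = (0, 1)$ and the one-to-one lexicon gives $(1, 0)$, the two limiting cases described in the text.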

Ferrer i Cancho and Solé define the total cost as the linear combination

$$\Omega(\lambda) = \lambda\beta + (1-\lambda)\alpha. \tag{7}$$

Small values of the parameter $\lambda$ place a larger emphasis on the speaker's cost and vice versa. The authors of [5] study the problem of minimizing $\Omega(\lambda)$ numerically, and report that at a certain critical value, $\lambda_c \simeq 0.41$, a phase transition occurs, at which certain properties, such as the effective lexicon size, change in a singular manner. (In Ref. [3] this conclusion was revised to reflect that the transition actually occurs at $\lambda = 1/2$.)



III. MINIMUM COST STATES 



In this section we determine the set of matrices $A$ that minimize $\Omega(\lambda)$. To begin, consider two very simple cases. First, suppose $A_{ji} = \delta_{i,1}$, i.e., all elements in column 1 equal to one, and all others zero. (Clearly, any column will do.) As noted previously, $\alpha = 0$ in this case, while $\beta = 1$. Next, consider $A_{ji} = \delta_{ji}$; now $\beta = 0$ and $\alpha = 1$. (Again, any permutation of the rows yields a matrix with the same costs.) If we restrict $A$ to be in one of these two families, then for $\lambda < 1/2$ it is the first kind (all nonzero elements in the same column) which minimizes $\Omega$, while for $\lambda > 1/2$ the second kind does the job. We then have

$$\min \Omega(\lambda) = \min\{\lambda, 1-\lambda\}, \tag{8}$$

which is indeed singular at $\lambda = 1/2$.

There are, of course, many other possible matrices $A$ allowed in the model. Can any of these improve on Eq. (8)? The matrices considered above satisfy $\alpha + \beta = 1$. If we can show that $\alpha + \beta \ge 1$ for any matrix $A$, then improvement is impossible. To see this, note that $\alpha + \beta \ge 1$ implies that

$$\lambda\beta + (1-\lambda)\alpha \ge \lambda\beta + (1-\lambda)(1-\beta) = (2\lambda - 1)\beta + 1 - \lambda \ge 1 - \lambda, \quad \text{for } \lambda \ge 1/2, \tag{9}$$

and

$$\lambda\beta + (1-\lambda)\alpha \ge \lambda(1-\alpha) + (1-\lambda)\alpha = (1 - 2\lambda)\alpha + \lambda \ge \lambda, \quad \text{for } \lambda \le 1/2. \tag{10}$$

Thus, if we can prove the inequality $\alpha + \beta \ge 1$, we will have shown that no matrix can improve on Eq. (8).

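It is not difficult to demonstrate the inequality. Let $B$ be an $n \times n$ matrix with the following properties:

(i) $B_{ji} \ge 0$;

(ii) each row contains a nonzero element;

(iii) each row sum is unity: $\sum_{i=1}^{n} B_{ji} = 1$.

Define $C$ as the matrix obtained from $B$ by dividing the elements in each column by the corresponding column sum. For each $i = 1, \ldots, n$, let

$$p_i = \frac{1}{n} \sum_{k=1}^{n} B_{ki}. \tag{11}$$

Then $p_i \ge 0$, and $\sum_{i=1}^{n} p_i = 1$ by property (iii). Define

$$\alpha = -\frac{1}{\ln n} \sum_{i=1}^{n} p_i \ln p_i = -\frac{1}{n \ln n} \sum_{i=1}^{n} \left( \sum_{k=1}^{n} B_{ki} \right) \left[ \ln \sum_{l=1}^{n} B_{li} - \ln n \right] = 1 - \frac{1}{n \ln n} \sum_{i=1}^{n} \left( \sum_{k=1}^{n} B_{ki} \right) \ln \sum_{l=1}^{n} B_{li}. \tag{12}$$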



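Next, let

$$h_i = -\frac{1}{\ln n} \sum_{j=1}^{n} C_{ji} \ln C_{ji} = -\frac{1}{\ln n} \sum_{j=1}^{n} \frac{B_{ji}}{\sum_{k=1}^{n} B_{ki}} \ln \frac{B_{ji}}{\sum_{k=1}^{n} B_{ki}},$$

and define $\beta = \sum_{i=1}^{n} p_i h_i$. Using the expressions above for $p_i$ and $h_i$, we find

$$\beta = -\frac{1}{n \ln n} \sum_{i=1}^{n} \sum_{j=1}^{n} B_{ji} \left[ \ln B_{ji} - \ln \sum_{k=1}^{n} B_{ki} \right], \tag{13}$$

so that

$$\alpha + \beta = 1 - \frac{1}{n \ln n} \sum_{i=1}^{n} \left( \sum_{r=1}^{n} B_{ri} \right) \ln \sum_{k=1}^{n} B_{ki} - \frac{1}{n \ln n} \sum_{j=1}^{n} \sum_{i=1}^{n} B_{ji} \left[ \ln B_{ji} - \ln \sum_{k=1}^{n} B_{ki} \right] \tag{14}$$

$$= 1 - \frac{1}{n \ln n} \sum_{j=1}^{n} \sum_{i=1}^{n} B_{ji} \ln B_{ji}. \tag{15}$$

For each $j$, $\sum_{i=1}^{n} B_{ji} = 1$, and since the $B_{ji}$ are nonnegative, each satisfies $B_{ji} \le 1$, so that $-\sum_{i=1}^{n} B_{ji} \ln B_{ji} \ge 0$ (with the convention $0 \ln 0 = 0$) and hence $\alpha + \beta \ge 1$. ■

Thus Eq. (8) represents the global minimum of $\Omega(\lambda)$. At $\lambda = 1/2$, there is a "phase transition", or better, a change in the nature of the ground state, at which the number $L$ of words used jumps from 1 to $n$. (In this sense, the transition is discontinuous.)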

There are $n$ matrices $A$ for which $\alpha = 0$: those in which all nonzero elements fall in the same column. There are, on the other hand, $n!$ matrices such that $\beta = 0$, i.e., any row-permutation of the identity matrix. For a matrix $A$ to satisfy $\alpha + \beta = 1$, it must have one and only one nonzero element in each row. (This follows from Eq. (15): if any row sum were greater than unity, the nonzero elements in the corresponding row of $B$ would be smaller than unity, making $\alpha + \beta$ strictly greater than 1.) The number of matrices that minimize $\Omega(\lambda = 1/2)$ is therefore $n^n$. Thus the multiplicity of the minimum-cost state is different for $\lambda < 1/2$, $\lambda > 1/2$, and $\lambda = 1/2$.
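The counting argument can be checked by brute force for very small systems. The following sketch enumerates all $n \times n$ binary matrices without empty rows and evaluates $\alpha + \beta$ through the closed form $1 - (n \ln n)^{-1} \sum_{ji} B_{ji} \ln B_{ji}$ derived in this section; the function names are hypothetical, and exhaustive enumeration is only feasible for $n$ of about 2 to 4.

```python
import itertools
import math

def alpha_plus_beta(A):
    """alpha + beta = 1 - (1/(n ln n)) * sum_ji B_ji ln B_ji for a square 0/1 matrix."""
    n = len(A)
    total = 0.0
    for row in A:
        s = sum(row)                      # row sum; B_ji = A_ji / s
        for a in row:
            if a:
                total += (a / s) * math.log(a / s)
    return 1.0 - total / (n * math.log(n))

def count_minimal(n):
    """Return (number of matrices with alpha + beta = 1, smallest alpha + beta found)."""
    rows = [r for r in itertools.product((0, 1), repeat=n) if any(r)]
    count, smallest = 0, float("inf")
    for A in itertools.product(rows, repeat=n):
        v = alpha_plus_beta(A)
        smallest = min(smallest, v)
        if abs(v - 1.0) < 1e-12:
            count += 1
    return count, smallest

print(count_minimal(2), count_minimal(3))
```

For $n = 2$ and $n = 3$ the enumeration finds $2^2 = 4$ and $3^3 = 27$ matrices with $\alpha + \beta = 1$, and no matrix with a smaller value, in line with the multiplicity $n^n$ stated above.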

Having determined the minimum-cost states, we turn to the associated rank-frequency relation. For a given matrix $A$, rank the symbols in order of decreasing probability, and let $P(k)$ be the probability of the $k$-th symbol in the ranking. Of principal interest is the mean, $\langle P(k) \rangle_\lambda$, over all matrices minimizing $\Omega(\lambda)$. For $\lambda < 1/2$, $P(k) = \delta_{k,1}$, while for $\lambda > 1/2$ there is no "ranking", as each symbol has the same probability, $1/n$. For $\lambda = 1/2$ and $n$ large, the numbers of nonzero elements in columns $1, 2, \ldots, n$ are essentially independent, identically distributed Poisson random variables with parameter 1. (Each of the $n$ rows places its single nonzero element in one of the $n$ columns with equal probability, so each column count is binomial with mean 1.) $\langle P(k) \rangle_{\lambda=1/2}$ is readily estimated via simulation; the result, shown in Fig. 1 (left panel), bears little resemblance to a power law. The simulation consists in generating a set of $n$ independent Poisson deviates with parameter 1, sorting them into decreasing order, $X_n^{(1)}, X_n^{(2)}, \ldots, X_n^{(k)}, \ldots$, and saving the values of $P(k) = X_n^{(k)}/n$ in a given realization. $\langle P(k) \rangle_{\lambda=1/2}$ is the average over many such independent realizations. The values of $\langle X_n^{(k)} \rangle$ can be calculated exactly for $k = 1$, 2, and 3 (see Appendix); the result agrees with simulation, as shown in Fig. 1 (right panel); $\langle X_n^{(k)} \rangle$ is seen to increase very slowly with $n$.

FIG. 1: Left: mean symbol frequency $\langle X_n^{(k)} \rangle$ versus rank $k$ for $\lambda = 1/2$, obtained via simulation, for (lower) $n = 1000$, and (upper) $n = 20\,000$. Data are averages over $2 \times 10^5$ realizations. Right: Mean symbol frequency versus system size $n$ for (upper to lower) $k = 1$, 2, and 3, for $\lambda = 1/2$, obtained using the exact expressions (curves, see Appendix) and via simulation (points).
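The sampling procedure just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for Fig. 1: the function names and the seed argument are hypothetical, and the Poisson deviates are drawn with Knuth's multiplication method, which is adequate for parameter 1.

```python
import math
import random

def poisson1(rng):
    """Draw one Poisson deviate with parameter 1 (Knuth's multiplication method)."""
    limit = math.exp(-1.0)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mean_rank_frequency(n, realizations, seed=0):
    """Estimate <P(k)> at lambda = 1/2 by averaging sorted Poisson(1) counts."""
    rng = random.Random(seed)
    mean = [0.0] * n
    for _ in range(realizations):
        # Column counts, sorted into decreasing order: X_n^(1) >= X_n^(2) >= ...
        x = sorted((poisson1(rng) for _ in range(n)), reverse=True)
        for k, v in enumerate(x):
            mean[k] += v / n              # P(k) = X_n^(k) / n in this realization
    return [v / realizations for v in mean]

P = mean_rank_frequency(n=200, realizations=50)
print(P[:5])
```

Because the column counts are treated as independent, the frequencies in a single realization sum to 1 only on average; the estimate becomes smoother as the number of realizations grows.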

The mean lexicon size at $\lambda = 1/2$ is $n$ times the probability $P(X > 0)$, where $X$ is a Poisson random variable (RV) with parameter 1. Thus for large $n$ and $\lambda = 1/2$, we have $\langle L \rangle / n = 1 - e^{-1} \simeq 0.6321$. The symbol probabilities follow $p_i = m/n$, where $m$ is again a Poisson RV with distribution $P_m = 1/(e\, m!)$ for $m = 0, 1, 2, \ldots, n$. The expected number of symbols having probability $m/n$ is $n P_m$, so that for large $n$, the speaker's cost is

$$\alpha \simeq -\frac{1}{\ln n} \sum_{m=0}^{\infty} \frac{n}{e\, m!}\, \frac{m}{n} \ln \frac{m}{n} = 1 - \frac{\kappa}{\ln n}, \tag{16}$$

where

$$\kappa = \frac{1}{e} \sum_{m=2}^{\infty} \frac{m \ln m}{m!} \tag{17}$$

is found numerically to be 0.5734028... At the transition then, the mean value of the speaker's cost tends slowly to unity as the number of objects $n$ tends to infinity.
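The constant defined by the series above, and the resulting large-$n$ estimate of the speaker's cost, are easy to evaluate numerically. The sketch below is a minimal illustration; the function names and the truncation at 60 terms are assumptions (the factorial in the denominator makes the neglected tail far smaller than double precision).

```python
import math

def kappa(terms=60):
    """(1/e) * sum_{m>=2} m ln m / m!, truncated after `terms` values of m."""
    return sum(m * math.log(m) / math.factorial(m) for m in range(2, terms)) / math.e

def alpha_large_n(n):
    """Large-n estimate of the speaker's cost at lambda = 1/2: 1 - kappa / ln n."""
    return 1.0 - kappa() / math.log(n)

print(kappa())
for n in (300, 1000, 20000):
    print(n, alpha_large_n(n))
```

The logarithm in the denominator is what makes the approach to unity slow: even for $n = 20\,000$ the estimate remains several percent below 1.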

We have seen that the rank-frequency distribution does not follow a power law when we average over the set of lexical matrices minimizing $\Omega$ at $\lambda = 1/2$. Next we examine the likelihood that a minimum-cost matrix (i.e., one for which $\alpha + \beta = 1$) follows Zipf's law. A small fraction of the minimum-cost matrices do in fact follow a Zipf distribution; we call these Z-matrices. One such matrix can be constructed as follows. In column 1, let the first $f_1$ elements be unity and the remainder zero; then in column 2, let the elements in rows $f_1 + 1$ up to $f_1 + [f_1/2]$ be unity, and all others zero. Proceed in this manner until $f_1$ columns have been populated with $f_1$, $f_2 = [f_1/2]$, ..., $f_j = [f_1/j]$, ..., $f_{f_1} = 1$ nonzero elements, leaving the remainder of the columns with only zeros. (Here $[\ldots]$ denotes the integer part of its argument.) By construction, the symbol frequencies follow a Zipf distribution. The number $n$ of objects is approximately

$$n \simeq f_1 (\ln f_1 + \gamma), \tag{18}$$

where $\gamma \simeq 0.5772$ denotes the Euler-Mascheroni constant.
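The construction of a Z-matrix can be illustrated by listing the column occupancies $f_j = [f_1/j]$ and comparing their sum with the harmonic-number estimate $f_1(\ln f_1 + \gamma)$. The function names below are hypothetical, and the sketch only tabulates the column counts rather than building the full matrix.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def zipf_column_counts(f1):
    """Number of nonzero elements per populated column: f_j = floor(f1 / j), j = 1..f1."""
    return [f1 // j for j in range(1, f1 + 1)]

def n_exact(f1):
    """Number of rows (objects) actually used by the construction."""
    return sum(zipf_column_counts(f1))

def n_approx(f1):
    """Harmonic-number estimate f1 * (ln f1 + gamma), which ignores the integer parts."""
    return f1 * (math.log(f1) + EULER_GAMMA)

for f1 in (30, 100):
    print(f1, n_exact(f1), n_approx(f1))
```

For $f_1 = 30$ the estimate is about 119.4, while the sum of integer parts is 111; the estimate is an upper bound because taking integer parts can only lower each term.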

The number $N_Z(n)$ of Z-matrices is given by the number of choices of columns and rows. Assuming that the $f_j$ are all distinct, we have $n!/(n - f_1)!$ choices for the columns. (Since there will, in general, be several columns with only one, or two, etc., nonzero elements, this is actually an overestimate.) Independent of the column permutations, we may permute the rows; the number of such permutations is approximately $n!/[f_1!\, (f_1/2)!\, (f_1/3)! \cdots 2!\, 1!]$. Using Stirling's formula one finds

$$\ln N_Z(n) \simeq 2n \ln n - (n - f_1) \ln(n - f_1) - f_1 - n \ln f_1 + f_1 \chi_f, \tag{19}$$

where

$$\chi_f = \sum_{j=1}^{f_1} \frac{\ln j}{j}. \tag{20}$$

Since the number $N_{mc}(n)$ of $n \times n$ matrices with $\alpha + \beta = 1$ is $n^n$, the fraction represented by Z-matrices is extremely small. For a Zipf distribution of very modest length, $f_1 = 30$, one has $n = 120$ and $N_Z/N_{mc} \simeq 10^{-41}$. Increasing $f_1$ to 100 implies $n = 519$, and the ratio becomes of order $10^{-313}$! It is clear that this tendency will not change even if we relax our requirements to consider a matrix compatible with Zipf's law (e.g., by allowing the Zipf exponent to be $\mu \neq 1$, or by allowing small fluctuations in the $f_k$ about a strict power law). Thus, we conclude that any reasonably sized Zipf distribution has essentially zero probability of appearing in the set of minimum-cost matrices at $\lambda = 1/2$.
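The orders of magnitude quoted above can be reproduced from the Stirling estimate. The sketch below assumes the form $\ln N_Z \simeq 2n\ln n - (n-f_1)\ln(n-f_1) - f_1 - n\ln f_1 + f_1 \sum_{j=1}^{f_1} (\ln j)/j$, obtained by applying $\ln x! \simeq x\ln x - x$ to the column and row counts given in the text, together with $N_{mc} = n^n$; the function name is hypothetical.

```python
import math

def log10_fraction_z(f1, n):
    """log10 of N_Z / N_mc from the Stirling estimate of ln N_Z and N_mc = n**n."""
    chi = sum(math.log(j) / j for j in range(1, f1 + 1))
    ln_nz = (2 * n * math.log(n)
             - (n - f1) * math.log(n - f1)
             - f1
             - n * math.log(f1)
             + f1 * chi)
    ln_nmc = n * math.log(n)
    return (ln_nz - ln_nmc) / math.log(10)

for f1, n in ((30, 120), (100, 519)):
    print(f1, n, log10_fraction_z(f1, n))
```

Working with logarithms throughout avoids overflow and underflow: the ratio for $f_1 = 100$ lies near the smallest magnitude representable in double precision, so it could not be computed reliably as a plain quotient.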



IV. SIMULATION 



Our aim in this section is to determine whether a minimum-search algorithm is able to attain the minimum-cost states identified above. (Note that the simulations reported in the preceding section do not involve searching for the minimum cost; they are merely used to estimate the rank-frequency relation for a set of independent Poisson RVs.) We apply a Monte Carlo algorithm along the lines of [5] to the minimization of $\Omega(\lambda)$: with probability $\nu$ an element of the lexical matrix $A$ is flipped. If the resulting cost is lower, the flip is accepted, otherwise it is rejected. (Naturally, flips from 1 to 0 that would leave a row with all elements zero are also rejected.) This procedure is repeated a large number of times (of the order of $10^7$ Monte Carlo steps per matrix element). A flipping probability $\nu = 4/(n(n-1))$ is used as in [5]. To track the evolution of $\Omega(\lambda)$ over time we initialise the matrix $A$ by setting all elements equal to 1. Typical evolutions are shown in Fig. 2 for $\lambda = 1/2$ and $n = 300$.

Clearly, little variation is seen among the 100 simulations. Indeed, the ensemble average is $\langle \Omega(1/2) \rangle = 0.5014(9)$, where the figure in parentheses denotes the ensemble standard deviation in the last digit. Meanwhile, the speaker's cost is found to be $\langle \alpha \rangle = 0.902(6)$, as compared to $\alpha = 0.8995$ obtained from Eq. (16) for $n = 300$. Thus the algorithm comes very close to the theoretical minimum $\Omega(1/2) = 1/2$; the histogram for the symbol frequency in Fig. 3 for $n = 300$ (black line) is very similar to that obtained via the Poissonian statistics approach of the preceding section (red line), as well as to Fig. 3C in [5], for $n = 150$.

FIG. 2: Evolution of costs for 100 independent simulations, for $\lambda = 1/2$ and $n = 300$. Time is measured in Monte Carlo steps per matrix element. The horizontal lines at heights 1/2 and 1 indicate the values theoretically minimising $\Omega(\lambda)$ and $\alpha + \beta$, respectively.

We note that the performance of the algorithm deteriorates for $\lambda < 1/2$. Thus, while it is possible to fully minimize the cost at $\lambda = 1/2$ within a reasonable amount of time, the same is not true for $\lambda < 1/2$, in which case the resulting distributions of $P(k)$ are sensitive to details such as the initial condition for $A$ and the amount of time the simulation is allowed to run.

FIG. 3: (Black line) Symbol frequency $P(k)$ versus rank $k$ for $\lambda = 0.5$ and $n = 300$, averaged over 100 simulations, using a spin-flipping algorithm along the lines of [5]. (Red line) $P(k)$ obtained from Poissonian statistics as described in Sec. III.
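The zero-temperature search described in this section can be sketched as follows. This is a simplified illustration, not the code used for Figs. 2 and 3: the function names, the seed, and the choice to reject a whole trial matrix when any row would become empty are assumptions, and the flipping probability `nu` is left as a free parameter.

```python
import math
import random

def omega(A, lam):
    """Total cost lam*beta + (1 - lam)*alpha for a square 0/1 matrix A (list of rows)."""
    n = len(A)
    B = []
    for row in A:
        s = sum(row)
        B.append([a / s for a in row])              # rows normalised to unit sum
    col = [sum(B[j][i] for j in range(n)) for i in range(n)]
    alpha = -sum((c / n) * math.log(c / n) for c in col if c > 0) / math.log(n)
    beta = 0.0
    for i in range(n):
        for j in range(n):
            if B[j][i] > 0:
                c = B[j][i] / col[i]                # conditional probability p(r_j | s_i)
                beta -= (col[i] / n) * c * math.log(c) / math.log(n)
    return lam * beta + (1.0 - lam) * alpha

def minimize(n, lam, steps, nu, seed=0):
    """Greedy Monte Carlo search: flip each element with probability nu, keep improvements."""
    rng = random.Random(seed)
    A = [[1] * n for _ in range(n)]                 # start from the all-ones matrix
    cost = omega(A, lam)
    for _ in range(steps):
        trial = [[1 - a if rng.random() < nu else a for a in row] for row in A]
        if any(sum(row) == 0 for row in trial):
            continue                                # reject: some object has no symbol
        c = omega(trial, lam)
        if c < cost:                                # accept only if the cost decreases
            A, cost = trial, c
    return A, cost

A, cost = minimize(n=6, lam=0.5, steps=5000, nu=0.1)
print(cost)
```

Recomputing the full cost at every step is wasteful but keeps the sketch short; for matrices of the size considered in the text an incremental update of the row and column sums would be the natural optimisation. Since only improvements are accepted, the final cost can never exceed the initial value of 1, and by the bound of Sec. III it cannot fall below $\min\{\lambda, 1-\lambda\}$.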

V. CONCLUSIONS 

In summary, we have shown that the minimization problem associated with the Ferrer i Cancho-Solé model is amenable to exact analysis, which yields $\min \Omega(\lambda) = \min\{\lambda, 1-\lambda\}$. While this expression and other aspects of the minimum-cost state are singular at $\lambda = 1/2$, we find no evidence for the power-law frequency-rank distribution reported in [5]. Minimizing the cost only yields Zipf's law for a small fraction of minima at $\lambda = 1/2$, which vanishes for the relevant case of increasing system size.

Given these results, one might suspect that the Zipf-like distribution is associated with sub-optimal states, with costs slightly greater than the minimum. Our simulations, however, do not yield a power-law distribution in this slightly sub-optimal situation, casting doubt on whether Zipf's law can be explained in this manner. Some of the results reported here were obtained previously through different techniques in Ref. [7], which showed that the minima of $\Omega(\lambda)$ are degenerate, and that for $0 < \lambda < 1/2$ the global minimum corresponds to a single symbol ($\Omega = \lambda$). These authors also speculated "that Zipf's law ... could be the consequence of local minima of $\Omega(\lambda)$". Here we obtained analytical results for $1/2 < \lambda < 1$ as well ($\Omega = 1 - \lambda$), showed that the transition happens exactly at $\lambda = 1/2$ and is characterized by a large multiplicity of states, and showed that the numerical simulations do not yield Zipf's law.

Finally, it is interesting to speculate about which alterations of the model of Ref. [5] could lead to Zipf's law. While models with substantial modifications have been investigated [6-8], our finding of states compatible with Zipf's law at $\lambda = 1/2$ suggests that small modifications of the model might be sufficient to break the degeneracy, such that a power-law distribution would correspond to the global minimum cost. It is also worth considering the possibility that the evolution of language cannot reach the minimum-cost states on historic time scales, and instead wanders in a space of sub-optimal configurations, for which Zipf's principle does hold to good approximation.

Appendix: Expectation of the maximum of n independent random variables.



Consider an integer-valued random variable $Y$ with probability distribution $p_m = \mathrm{Prob}[Y = m]$, and let $q_m = \mathrm{Prob}[Y \le m]$. Let $Y_1, \ldots, Y_n$ be a set of $n$ independent random variables drawn from this distribution, and let $X_n^{(k)}$ denote the $k$-th largest variable in this set. The event $X_n^{(1)} = m$ corresponds to having one or more of the $Y_i$ equal to $m$, and all others smaller. Thus

$$\mathrm{Prob}[X_n^{(1)} = m] = \sum_{j=1}^{n} \binom{n}{j} p_m^{j}\, q_{m-1}^{n-j} = q_m^n - q_{m-1}^n, \tag{21}$$

and $\langle X_n^{(1)} \rangle$ is given by the (conditionally convergent) sum

$$\langle X_n^{(1)} \rangle = \sum_{m=0}^{\infty} m \left( q_m^n - q_{m-1}^n \right) = \sum_{m=0}^{\infty} \left( 1 - q_m^n \right). \tag{22}$$

For the Poisson distribution with parameter unity, $p_m = 1/(e\, m!)$, the above sum converges quite rapidly, with the contribution due to terms with $m > 20$ being negligible for the $n$ values considered here. Figure 1 (right panel) shows that the simulations of Sec. III are in good agreement with our analysis.

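The means of the second and third largest variables can be obtained using

$$\mathrm{Prob}[X_n^{(2)} = m] = \mathrm{Prob}[X_n^{(1)} = m] + n \left\{ (1 - q_m) \left[ q_m^{n-1} - q_{m-1}^{n-1} \right] - p_m\, q_{m-1}^{n-1} \right\} \tag{23}$$

and

$$\mathrm{Prob}[X_n^{(3)} = m] = \mathrm{Prob}[X_n^{(2)} = m] + \frac{n(n-1)}{2} \left\{ (1 - q_m)^2 \left[ q_m^{n-2} - q_{m-1}^{n-2} \right] - p_m\, q_{m-1}^{n-2} \left[ p_m + 2(1 - q_m) \right] \right\}. \tag{24}$$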

Analogous formulas can of course be derived for the fourth and subsequent variables, though they become increasingly more complicated. We have verified that the simulations agree with the exact expressions for the means of $X_n^{(1)}$, $X_n^{(2)}$, and $X_n^{(3)}$. For example, for $n = 1000$ we have $\langle X_n^{(k)} \rangle = 5.51384$, 5.00381, and 4.73083 for $k = 1$, 2, and 3, respectively, while simulation yields 5.5136(3), 5.0040(4) and 4.7308(6), with the figures in parentheses denoting statistical uncertainties.
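The exact means quoted above can be reproduced numerically. Rather than coding the probabilities of the second and third largest variables term by term, the sketch below uses the equivalent cumulative form, $\mathrm{Prob}[X_n^{(k)} \le m] = \sum_{j=0}^{k-1} \binom{n}{j} (1-q_m)^j q_m^{n-j}$ (at most $k-1$ of the $n$ variables exceed $m$), together with $\langle X \rangle = \sum_{m \ge 0} \mathrm{Prob}[X > m]$. This reformulation, the function name, and the cutoff `mmax` are choices made for the illustration.

```python
import math

def mean_kth_largest(n, k, mmax=60):
    """Mean of the k-th largest of n i.i.d. Poisson(1) variables, via a tail sum."""
    mean, q = 0.0, 0.0
    for m in range(mmax):
        q += math.exp(-1.0) / math.factorial(m)   # q_m = Prob[Y <= m]
        q = min(q, 1.0)                           # guard against rounding above 1
        # Prob[X_n^(k) <= m]: at most k-1 of the n variables exceed m.
        cdf = sum(math.comb(n, j) * (1.0 - q) ** j * q ** (n - j) for j in range(k))
        mean += 1.0 - cdf                         # adds Prob[X_n^(k) > m]
    return mean

for k in (1, 2, 3):
    print(k, mean_kth_largest(1000, k))
```

The same function extends to the fourth and later order statistics without further algebra, at the price of not exhibiting the explicit structure of the probabilities for $k = 2$ and 3.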